Yali Du

Hunt Instead of Wait: Evaluating Deep Data Research on Large Language Models

Feb 02, 2026
Via arXiv

Is Pure Exploitation Sufficient in Exogenous MDPs with Linear Function Approximation?

Jan 28, 2026
Via arXiv

Social World Model-Augmented Mechanism Design Policy Learning

Oct 22, 2025
Via arXiv

Intrinsic Memory Agents: Heterogeneous Multi-Agent LLM Systems through Structured Contextual Memory

Aug 12, 2025
Via arXiv

NOVER: Incentive Training for Language Models via Verifier-Free Reinforcement Learning

May 21, 2025
Via arXiv

Automatic Dataset Generation for Knowledge Intensive Question Answering Tasks

May 20, 2025
Via arXiv

Post-Incorporating Code Structural Knowledge into LLMs via In-Context Learning for Code Translation

Mar 28, 2025
Via arXiv

SocialJax: An Evaluation Suite for Multi-agent Reinforcement Learning in Sequential Social Dilemmas

Mar 18, 2025
Via arXiv

GRU: Mitigating the Trade-off between Unlearning and Retention for Large Language Models

Mar 12, 2025
Via arXiv

M3HF: Multi-agent Reinforcement Learning from Multi-phase Human Feedback of Mixed Quality

Mar 06, 2025
Via arXiv